The realization that cancer progression required the participation of cellular genes provided one of several key rationales,
in 1986, for embarking on the human genome project. Only with a reference genome sequence could the full spectrum of 
somatic changes leading to cancer be understood. Since its completion in 2003, the human reference genome sequence 
has fulfilled its promise as a foundational tool to illuminate the pathogenesis of cancer. Herein, we review the key
historical milestones in cancer genomics since the completion of the genome, and some of the novel discoveries that are
shaping our current understanding of cancer. 



The 40 years' "war on cancer" 

The U.S. declaration of war on cancer, in 1971, was the product of 
heady optimism following the remarkable technological feat of 
landing on the moon. Recombinant DNA cloning was still in its 
infancy and not yet a match for the task of unraveling the full 
complexity of the human cell. Nevertheless, the proof that muta- 
tion of normal genes could cause cancer, a fact we take for granted 
today, led Dulbecco to realize as early as 1985 the value of ob- 
taining the sequence of all the human chromosomes as the 
foundation for understanding cancer (Dulbecco 1986). The con- 
temporaneous discussion about generating the complete human 
sequence represented an astonishing leap since, at the time, se- 
quencing a single gene of about 1 kilobase was worthy of a PhD 
dissertation, and the human genome was 3 million times larger. 
The repeat structure of the human genome was already well characterized,
and some viewed it as an insurmountable obstacle to such an endeavor
and as grounds for early resistance to the project (Robertson 1986).
Despite these obstacles, an international effort led to
the completion of the human reference genome in 2003 (International
Human Genome Sequencing Consortium 2004), and
with a finished human genome reference in hand, it became 
possible to test in earnest the value of genomic approaches to 
decipher the genetic changes that lead to cancer. The data that 
have emerged since 2003 overwhelmingly support the value of 
this vision and have changed the way cancer is researched and 
understood. 

During the period of the sequencing of the human genome 
(1990-2003) (http://www.ornl.gov/sci/techresources/Human_Genome/project/about.shtml),
cancer researchers continued to
accumulate knowledge of the basic mechanisms of cancer, and 
using a variety of clever cloning strategies, with steadily improving 
sequencing capabilities, identified the majority of the most potent 
oncogenes and tumor suppressors. An inventory of the genes associated
with cancer yielded 291 cancer genes based on mutation
data available in the literature, about 1% of the coding sequence
(Futreal et al. 2004). It was noted that 90% of these genes were
somatically mutated, 20% germline mutated, and 10% could be 
found in both categories. The division between germline and 
somatic genes is a mysterious dichotomy that remains unexplained
in the most current inventory
(http://www.sanger.ac.uk/genetics/CGP/Census/). The most common form of variation
in the 2004 inventory was translocation leading to the
production of oncogenic fusion proteins. Until 2004, no one 
had studied more than a handful of genes at any one time in 
a single patient. 

That was the state of cancer genomic research at the threshold 
of the "genomic era" of cancer research: an era heralded by the 
availability of the high-quality reference genome, and the dra- 
matic explosion of DNA sequence data fueled by the introduction 
of inexpensive massively parallel sequencing instruments. This
year is the 10th anniversary of that remarkable
milestone in science, the completion of the reference human
genome. At this juncture, we recapitulate some of the key findings
and challenges that have emerged from the sequence analysis of 
the cancer genome. 

The armamentarium 

With base-level resolution of the human reference genome in 
hand, cancer researchers turned to the large-scale study of muta- 
tion, with the promise of generating the entire catalog of muta- 
tions peculiar to a given disease as well as to a single patient. Figure 1 
tracks the development of some of the key technologies, resources, 
and milestones in the development of the present-day armamen- 
tarium of cancer mutation discovery. Massively parallel sequenc- 
ing was introduced by Roche 454 and Illumina in 2004-2006 and 
soon demonstrated the feasibility of sequencing complete normal 
and tumor genomes of exemplar human subjects on both plat- 
forms (Ley et al. 2008; Wheeler et al. 2008). At the time, it appeared 
that the application of whole-genome sequencing to routine re- 
search and clinical diagnosis might be on the horizon. Although 
the use of whole-genome sequence (WGS) is far from routine to- 
day, the results generated so far are lending insight into the po- 
tential of WGS for diagnostic, prognostic, and therapeutic im- 
provement in the treatment of cancer patients. 

Using PCR and dye-terminator sequencing, Vogelstein and 
colleagues amplified and sequenced each coding exon of 18,000 
genes, defined by the human genome sequence, in 11 each of 
breast and colorectal tumors (Wood et al. 2007). This brute force 
whole-exome sequencing (WES) approach afforded for the first 
time a comprehensive view of the mutation profile of each patient, 
which, when summed across patients, revealed the "cancer genes" 
for the patients in the given cohort. In one stroke, the mutation
profile of a cancer was revealed: a set of recurrently mutated genes plus a collection
of one-off mutations belonging to pathways and processes known
to be involved in tumorigenesis. The
fact that the most frequently mutated genes they observed, APC,
TP53, and KRAS for colon cancer and TP53 for breast cancer,
recapitulated what was already known, validated the approach
and paved the way for expanded application of genome-scale
sequencing.

Figure 1. Major events in a decade of cancer genomics. (Dark blue) Major advances in massively parallel sequencing platforms and targeted
enrichment technologies; (black) major large-scale projects designed to catalog genomic variations of normal human individuals; (red) cancer genomics.
(dbSNP) Database of single nucleotide polymorphism; (HapMap) haplotype map of the human genome; (ENCODE) Encyclopedia of DNA Elements;
(COSMIC) Catalog of Somatic Mutations in Cancer; (TCGA) The Cancer Genome Atlas; (GA) genome analyzer; (CRC) colorectal carcinoma; (WES)
whole-exome sequencing; (ICGC) International Cancer Genome Consortium; (TSP) tumor sequencing project; (AML) acute myeloid leukemia; (WGS)
whole-genome sequencing; (OSCC) ovarian small cell carcinoma.

The introduction of DNA sequence enrichment technologies 
from NimbleGen and Agilent (Albert et al. 2007; Gnirke et al. 2009) 
enabled WES on large scales. WES has additional advantages over
WGS in that the average depth of coverage is about fivefold greater,
and the costs of sequencing, data processing, and storage are all
much lower. Given the relative tractability of interpreting variation
in the coding sequence compared to intergenic or intronic muta- 
tions, the period between 2004 and 2013 has seen a profusion of 
tumor types analyzed in large cohorts (100-500 patients), mainly 
by WES (see http://www.sanger.ac.uk/genetics/CGP/cosmic/papers/ 
for a comprehensive listing). WGS for a variety of tumors has also 
been reported and, in spite of the smaller numbers of patients, has 
led to surprising insights into cancer biology, based largely on 
analysis of structural variation in tumor genomes. With WGS and the
different modes of sequence analysis available today (see Chin et al. 2011),
the genetic alterations observed in the DNA of the cancer cell span
six orders of magnitude, from single-base point mutations to
chromosome-scale amplification.

With these tools in hand, The Cancer Genome Atlas (TCGA) 
(http://cancergenome.nih.gov/), the Cancer Genome Project 
(http://www.sanger.ac.uk/genetics/CGP/), the International Cancer 
Genome Consortium (ICGC) (Hudson et al. 2010), Therapeutically 
Applicable Research to Generate Effective Treatments
(http://target.cancer.gov/), and other privately funded large-scale projects
(Downing et al. 2012) began in earnest to systematically
catalog all the mutations in a wide variety of adult and pediatric
cancers (see Garraway and Lander 2013 for a recent tally of large- 
scale projects). 

WGS and WES have been augmented by cDNA
sequencing (referred to as RNA-seq) to explore alterations to 
the transcriptome. RNA-seq provides not only gene expression 
levels, but also aberrant splicing, chimeric gene fusion transcripts 
characteristic of cancer cells and expressed somatic mutations 
(Bainbridge et al. 2006; Dong et al. 2009; Maher et al. 2009; Shah 
et al. 2009; Berger et al. 2010; Tuch et al. 2010; Wang et al. 2012). 
Analysis of chromatin modification is in its infancy as applied to 
the cancer cell, but the recent reporting of the ENCODE Project 
Consortium's genome-wide results (The ENCODE Project Con- 
sortium 2012) may provide the tools and technologies to enable 
new approaches. The technology behind DNA sequencing is improving
rapidly in accuracy, cost, and speed, which makes analysis of the
primary sequence of the tumor genome an essential strategy
in the war on cancer, both for advances in cancer biology and
for clinical testing. However, the coordinated acquisition
and integrated interpretation of all these data have been possible
because a reference genome exists for comparison. What have we
learned so far?

Mutation frequencies 
By patient 

Figure 2. Frequencies of somatic mutations in cancer patients. All data represent primary tumors. Only nonsilent mutations (missense, nonsense,
frameshift, and splice site) were counted. (A) Overall frequencies of somatic mutations. Each black dot represents a tumor. The light blue shaded group
indicates pediatric tumors, and the deeper blue shaded group indicates adult tumors. Red horizontal lines within each cluster of points indicate median
value of the mutation frequency of each tumor type. (ALL) Acute lymphoblastic leukemia; (AML) acute myeloid leukemia; (C) carcinoma; (GCT) germ cell
tumor; (CRC) colorectal carcinoma; (MSI) microsatellite instability; (MSS) microsatellite stable; (POLE) patients with somatic mutation in the nuclease
(proofreading) domain of the POLE gene. The outlier low-grade glioma patient with >100 mutations per Mb is also POLE-mutated. (B) Frequency
classification of tumors. The pie charts divide the patients into three groups based on frequency of nonsilent mutation: 0 detectable somatic mutations, less
than 30, and greater than or equal to 30 for selected representative tumor types (30 mutations represent a frequency of 1 per Mbp in A). The nested
histograms below the pie charts show the percentage of patients with no significantly mutated genes (SMG, calculated by MutSig, q < 0.1), no cancer
census genes (CGC), or no mutations at all. The sequencing data for all the pediatric tumors, CRC, and hepatocellular carcinoma were generated at the
Human Genome Sequencing Center at Baylor College of Medicine. The sequencing data for all other adult tumors were from the TCGA Genome Data
Analysis Center (https://confluence.broadinstitute.org/display/GDAC/Home). Pediatric AML, ALL, and Wilms tumor data were obtained from the
TARGET project (http://www.targetproject.net/).

The median frequency of point mutation varies over more than
three orders of magnitude across human tumors; within a given
tumor type, the variation in frequency is about one order of
magnitude (Fig. 2A). The variation in mutation frequency is
a function of the number of somatic cell divisions prior to initiation
of the tumor, the exposure to environmental mutagens
(notably UV radiation and tobacco leaf by-products), and
in some cancers, altered fidelity of the tumor DNA replication 
system. At the low end of the range are pediatric cancers, fol- 
lowed by adult leukemia and adult solid tumors. Tumors that 
exceed 10 coding mutations per megabase pair (Mbp) are often 
found deficient in mismatch repair, either through mutation 
or epigenetic silencing of MLH1 or one of the other mismatch 
repair enzymes. Tumors with coding mutation frequencies of 
100 per Mbp or greater are mutated in the exonuclease domain 
of POLE, one of two DNA replicative enzymes of the cell (The
Cancer Genome Atlas Research Network 2013). These patterns
may have important implications for clinical testing in that
colorectal patients with high rates of mutation due to mismatch
repair deficiency (Walther et al. 2009) or replicative dysfunction tend to
have improved survival compared to their lower mutation rate
counterparts for the same tumor type. At the other end of the
scale, many pediatric patients have so few coding mutations
that DNA sequencing sheds little light on the etiology or prognosis
of their disease. For the broad range of adult cancers with
intermediate rates, mutation discovery is becoming increasingly 
important in subclassifying disease for prognosis and treatment 
(e.g., Patel et al. 2012). 
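
The rate thresholds described above can be made concrete with a short sketch. This is an illustration only, not a method taken from the studies cited: the function names and labels are hypothetical, the example counts are invented, and the 30-Mbp coding territory is an assumption inferred from the Figure 2 legend (30 mutations correspond to roughly 1 per Mbp).

```python
# Illustrative sketch; thresholds are the approximate values quoted in the text.
# Assumption: about 30 Mbp of coding sequence is surveyed per tumor, inferred
# from the Figure 2 legend (30 mutations correspond to roughly 1 per Mbp).
CODING_TERRITORY_MBP = 30.0


def mutations_per_mbp(nonsilent_count, territory_mbp=CODING_TERRITORY_MBP):
    """Convert a tumor's nonsilent mutation count to mutations per Mbp."""
    return nonsilent_count / territory_mbp


def rate_class(rate_per_mbp):
    """Assign a rough, hypothetical label based on the coding mutation rate."""
    if rate_per_mbp >= 100:
        return "POLE range"             # exonuclease-domain mutation expected
    if rate_per_mbp > 10:
        return "mismatch-repair range"  # often MLH1 or another repair gene
    return "typical range"


if __name__ == "__main__":
    for count in (12, 450, 4500):       # hypothetical per-tumor counts
        rate = mutations_per_mbp(count)
        print(f"{count} mutations -> {rate:.1f} per Mbp -> {rate_class(rate)}")
```

The labels only flag which repair or replication defect would be worth checking; they are not a diagnosis, and real tumors near a threshold would need direct testing of the relevant genes.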

By gene

The aforementioned cancer gene census (Futreal et al. 2004) has 
given way to COSMIC, the Catalogue of Somatic Mutations in 
Cancer (Forbes et al. 2011). The most recent inventory of cancer
mutations in COSMIC (http://www.sanger.ac.uk/genetics/CGP/Census/)
chronicles nearly 800,000 somatic events. Essentially
every gene is mutated at least once, reflecting the explosion of data
available to researchers.

Every tumor type has a characteristic collection of frequently
mutated genes, as exemplified by colorectal cancer
(Fig. 3), which are mutated significantly more often
than expected, given the tumor's background mutation rate (see
Fig. 2A). Higher-than-expected mutation frequency is prima facie
evidence of the important role the mutated genes play in the
given cancer. The pattern established in the first completed exome
screening in colorectal and breast cancers (Wood et al. 2007), and
repeated in every cancer since, is that one to three genes are mutated
in >20% of the tumors. A shoulder on the distribution may
display several more genes mutated in 10%-20% of the tumors, 
and then infrequently mutated genes form a long tail to ever-de- 
creasing mutation frequency. Adding new low-frequency genes to 
the significantly mutated gene list requires sequencing patients in 
greater numbers. Large-scale projects such as TCGA and the ICGC 
projects aim to sequence 500 patients in each tumor type surveyed, 
with the expectation of collecting a sizeable fraction of the genes 
mutated in the 3%-5% range. Whereas the most potent cancer 
genes, often referred to as cancer "drivers," are small in number, 
the tail of genes mutated at <20% is long and is characteristic of 
a given tumor or tumor subtype (see Fig. 3A,B). 
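
The reasoning behind "mutated more often than expected" and behind the 500-patient cohort size can be sketched with a simple binomial model. This is a deliberately minimal illustration and is not the MutSig procedure used in the cited studies, which models many additional covariates. The background rate, gene length, and patient counts below are hypothetical values chosen only to show the arithmetic.

```python
from math import exp, lgamma, log


def binomial_tail(n, k, p):
    """Return P(X >= k) for X ~ Binomial(n, p), with 0 < p < 1."""
    def log_pmf(i):
        # Log of the binomial probability mass, computed via lgamma for stability.
        return (lgamma(n + 1) - lgamma(i + 1) - lgamma(n - i + 1)
                + i * log(p) + (n - i) * log(1 - p))
    return min(1.0, sum(exp(log_pmf(i)) for i in range(k, n + 1)))


def chance_hit_probability(background_per_mbp, coding_length_bp):
    """Probability that one tumor carries >= 1 background mutation in a gene.

    Assumes mutations fall as a Poisson process at a uniform background rate.
    """
    return 1.0 - exp(-background_per_mbp * coding_length_bp / 1e6)


if __name__ == "__main__":
    patients = 500                      # cohort size targeted by large projects
    # Hypothetical gene: 1,500 coding bp, background of 1 mutation per Mbp.
    p_chance = chance_hit_probability(1.0, 1500)
    # How surprising is it to see the gene mutated in 15 of 500 patients (3%)?
    print("p-value under background:", binomial_tail(patients, 15, p_chance))
    # How likely is a gene truly mutated in 3% of tumors to appear >= 5 times?
    print("chance of >= 5 observations:", binomial_tail(patients, 5, 0.03))
```

Under these assumptions, a gene mutated in a few percent of tumors is observed often enough in a cohort of 500 to stand out from background, whereas a cohort of a few dozen patients would rarely capture it more than once or twice.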

Figure 3. Significantly mutated genes and principal cancer pathways deregulated by somatic mutations in human colorectal carcinoma. Patients are
divided into two groups based on mutation rate. All genes shown are significantly mutated with a false discovery rate of less than 0.1. (A) Profile
determined from 193 patients with chromosome instable, low mutation rate, disease (see Fig. 2A, CRC MSS). (B) Profile determined from 29 microsatellite
instable CRC plus 7 POLE-mutated patients (see Fig. 2A, CRC MSI and CRC MSS POLE). (C) Principal cancer pathways deregulated by somatic mutation in
CRC. Alterations are defined by somatic mutations, homozygous deletions, high-level focal amplifications, and, in some cases, by significant up- or
down-regulation of gene expression (black up-triangle). All genes from Figure 3 except MLK4, GPC6, and EDNRB can be placed in one of the four pathways shown
here. WNT signaling is disrupted by one or more mutations in 93% of patients; TGFβ signaling is disrupted in 26% of all patients with a low mutation
rate and in 94% of patients with a high mutation rate, and RTK/RAS/PI3K signaling is disrupted in over 80% of patients. (Red) Activated genes; (blue) inactivated genes. Deep red or
blue are genes on the significantly mutated list from panels A and B. Lighter shaded genes are not mutated significantly in this cohort but contribute to
pathway disruption in some patients. Panels A and B adapted from Figure 1, and panel C from Figure 4, of The Cancer Genome Atlas Research Network
(2012a).

The genes on the tail of the frequency distribution have great
importance for the understanding of tumor biology. First, they
may exhibit redundancy of mutation in a given signaling pathway
in a single patient, which contributes to our understanding of how
the pathway is utilized by the cancer cell. These features are again
illustrated by colorectal cancer (Fig. 3C). APC is the primary driver 
of WNT signaling in this cancer; however, 10 other genes, altered 
in 1%-15% of the patients, also appear to participate in activation 
of this pathway, sometimes even in the context of APC mutations.
Given that one of the key outputs of the WNT pathway is up-regulation
of MYC, this mutation pattern suggests ever increasing selection
for activity of the MYC transcription program within the tumor 
cell, a prediction borne out by transcriptome analysis of these 
patients' tumors (The Cancer Genome Atlas Research Network 
2012a). Second, weaker drivers have often revealed entirely new 
pathways or processes, as discussed below. 

There are also important clinical implications for the low- 
frequency recurrent cancer genes. The tail of the mutation profile 
is often found to be populated with genes that are strong drivers 
in other cancers and for which therapies have already been 
designed or are in the pipeline (Taylor et al. 2012). This has given 
rise to the notion that it might be better to classify cancers based 
on mutation profile for developing treatment strategies rather 
than histopathology — a radical departure from well-established 
clinical standards. On the other hand, some patients' cancers
have no obvious mutations in any known cancer genes, so mutational
screening alone appears to have limited utility in such tumors
(see Fig. 2B). Every tumor population studied so far appears to
have a few percent of patients with no obvious cancer genes mu- 
tated, necessitating alternative diagnostic approaches. The proba- 
bility of finding no mutated cancer genes increases with decreasing 
mutation frequency in the tumor. 

The low-frequency genes also complicate the application
of therapies targeted to a specific gene in a pathway, because a tumor
may already carry mutations in one or more genes downstream from the target.
The well-known EGFR inhibitors erlotinib (Tarceva) and gefitinib 
(Iressa) fail in the context of mutation in KRAS in nonsmall cell lung 
cancers (Pao et al. 2005). Redundant mutation in key pathways is an 
increasingly common feature of cancer genomes (Fig. 3C), revealed 
by comprehensive exome or genome sequencing. 

When DNA sequencing is augmented by genomic approaches 
such as copy number analysis, RNA expression, or epigenetic 
profiling, evidence for aberrant silencing or activation of cancer 
genes adds to the list of genes relevant to each cancer. From the 
perspective of translating the rapidly accumulating knowledge of 
mutations to the bedside, the question arises as to how many genes 
clinicians will have to pay attention to for each cancer type and for 
any given patient. A fundamental question is whether the set of
cancer loci that can participate in the maintenance and survival
of the cancer cell is bounded, or whether we will one day have to monitor
the entire genome to effect a cure. Studies with patient populations
in the range of 100 to 500 patients (see Garraway and Lander 2013)
integrating results of several different platforms suggest that the
number of genes involved in any given tumor is small. Mutation
profiles like that shown in Figure 3 identify 15-20 significantly mutated genes.
Data from copy number analysis add roughly another 20 genes from
recurrent focal amplifications or deletions. Aberrant epigenetic and
expression profiles reinforce the somatic mutation data and may add
a few more genes. Based on results so far, it seems likely that the cast
of genes involved in any single cancer type will be in the neighborhood
of 50-100, as opposed to 500-1000.

Novel high-frequency cancer genes 

Many important new cancer genes, mutated in 20% or more of the
patients with common cancers, were absent from the 2004 cancer
gene census (Futreal et al. 2004). Among the genes emerging from
large-scale studies, the most impressive advances have provided 
insights into the role of chromatin remodeling in tumorigenesis. 
Isocitrate dehydrogenase 1 and 2, IDH1 and IDH2, were surprising 
additions to the list of cancer drivers emerging from glioblastoma 
multiforme (GBM) among the first whole-exome screens reported 
(Parsons et al. 2008). Both enzymes normally convert isocitrate
to α-ketoglutarate (α-KG), a cofactor for α-KG-dependent dioxygenases,
including TET family DNA demethylases, KDM-family histone
demethylases, and many other proteins (Yang et al. 2012). Mutated
IDH1/2 both produce 2-hydroxyglutarate, a structural analog of α-KG
but a potent inhibitor of α-KG-dependent enzymes, including the
demethylases that act on DNA and histones.
This inhibition, therefore, results in aberrant epigenetic modification
as well as the potential for deregulation of many other
cellular pathways. The DNA methyltransferase, DNMT3A, is mutated
in 22% of AML patients, suggesting a significant role for transcriptional
regulation through epigenetic modification of DNA
in the pathogenesis of this cancer (Ley et al. 2010). Mutations
in this gene are clinically important in AML patients since they
are associated with poor survival. Since its initial discovery,
DNMT3A has been recognized as a less frequent, although important,
contributor to transformation in nearly all myeloid neoplasms
(Ley et al. 2010).

PBRM1, found mutated in 41% of clear-cell renal carcinomas, was
the first member of the large SWI/SNF chromatin remodeling
complex to be found mutated at high frequency in any cancer (Varela
et al. 2011). Since then, mutations in many of the other SWI/SNF 
components have been steadily accumulating. Disruption in the 
normal function of SWI/SNF in over half the cancer patients is 
inferred for renal, ovarian, hepatocellular, gastric, and other can- 
cers (Shain and Pollack 2013). 

BAP1 encodes an enzyme that deubiquitinates
histone H2A as well as other proteins involved in chromatin
remodeling. It exhibits a high frequency (23%) of inactivating
mutations in mesothelioma (Bott et al. 2011) and in uveal melanomas,
where 84% of patients at high risk for metastasis (Harbour
et al. 2010) harbor mutations. It is also mutated in 15% of clear-cell
renal cell carcinomas, ccRCC, wherein it is anti-correlated with the 
more frequent PBRM1 mutations, noted above (Pena-Llopis et al. 
2012). Inactivating mutation in BAP1 defines a molecular subclass 
of high-grade aggressive tumors, which led the authors to develop 
a clinical immunohistochemical assay for the presence or absence 
of BAP1. Anti-correlation in the mutations of two genes usually 
indicates they are active in the same pathway. In this case, differing 
RNA expression profiles associated with mutations in the two 
genes suggested this was not the case. Instead, the authors con- 
cluded that it is disadvantageous to the tumor to have both genes 
mutated and that the two genes likely affect different epigenetic 
programs. 
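
Anti-correlation of this kind is typically quantified by asking how often two mutations would co-occur by chance if they were independent. The sketch below uses a one-sided hypergeometric (Fisher-type) calculation; it is an illustration under stated assumptions, not the analysis performed by the cited authors, and the cohort counts in the example are hypothetical.

```python
from math import comb


def exclusivity_p_value(n_total, n_a, n_b, n_both):
    """One-sided P(overlap <= n_both) if mutations in A and B were independent.

    n_total: tumors in the cohort; n_a, n_b: tumors mutated in gene A or B;
    n_both: tumors mutated in both. Assumes n_both <= min(n_a, n_b).
    """
    lowest = max(0, n_a + n_b - n_total)   # smallest overlap that is possible
    favorable = sum(comb(n_a, k) * comb(n_total - n_a, n_b - k)
                    for k in range(lowest, n_both + 1))
    return favorable / comb(n_total, n_b)


if __name__ == "__main__":
    # Hypothetical cohort: 100 tumors, 41 mutated in gene A, 15 in gene B,
    # and only 1 tumor carrying both mutations.
    print(exclusivity_p_value(100, 41, 15, 1))
```

A small value indicates fewer double-mutant tumors than independence would predict. As the text notes, such a result does not by itself say whether the two genes share a pathway; expression data were needed to distinguish the alternatives.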

This list of new high-frequency genes, added to the cancer 
gene compendium as a direct result of the human reference ge- 
nome and next-generation sequencing, reveals the importance of 
epigenetic regulation for tumorigenesis. 

Novel low-frequency cancer genes 

At the other end of the mutation profile are cancer genes con- 
tributing to 10% or fewer tumors of each cancer type. For most 
adult solid tumors, they constitute a greater number of mutated 
genes, filling in details of key pathways and adding interesting 
insight into the mechanisms of cancer. Perhaps the most interesting
among the novel genes are those involved in RNA
processing functions. Discovered first in myelodysplastic syndrome,
U2AF1, U2AF2, SF3B1, and SRSF2 (Yoshida et al. 2011),
involved in splice acceptor recognition in the RNA splicing machinery,
were mutated cumulatively in over 50% of patients.
Subsequently, mutations were found in these and other splicing
components in other myeloid neoplasms (Wang et al. 2011;
Makishima et al. 2012). Functional testing of the commonly recurrent
mutation sites in U2AF1 demonstrated that they promote
exon skipping in in vitro assays (Graubert et al. 2012). U2AF1 mutation was
recently reported in lung adenocarcinoma at the same frequency
as seen in AML (Imielinski et al. 2012). Additional RNA binding
proteins, RBM10 in lung (Imielinski et al. 2012) and DDX3X in
CLL (Wang et al. 2011), hint at additional contributions to
cancer in RNA processing pathways.

Whole-genome sequencing and structural variation 

WGS is a general route to reliable discovery and
interpretation of mutation in noncoding regions of the genome.
This fact was driven home dramatically in the recent reports of 
somatic base substitution at one of only two positions in the 
promoter of TERT in over 70% of melanoma patients (Horn et al. 
2013; Huang et al. 2013). Although recurrent mutations in non- 
coding DNA had been reported previously (Chapman et al. 2011), 
they were clustered within predefined conserved regions rather 
than recurrent at the same site and, moreover, were without 
functional follow-up. Both TERT promoter mutations created 
novel ETS transcription factor binding sites, and the mutations 
were demonstrated to up-regulate TERT transcription in in vitro
luciferase reporter assays (Huang et al. 2013). These promoter mutations,
recurring at an astonishing rate within and between tumors, may turn out to be more
prevalent than mutation of TP53, the most frequently mutated
gene across all cancers to date.

Large-scale translocations, deletions, and inversions leading
to oncogenic fusion proteins or promoter activation of an oncogene
seldom involve the coding sequence directly and are therefore
not visible in WES data, although there has been limited
success in capturing known fusion junctions in cohorts expected
to harbor them when the target can be restricted to a small interval
(Lipson et al. 2012). WGS analyses have dramatically increased our
knowledge of the number of translocations and deletions giving
rise to gene fusions, and the list of these events is still growing.
Among the 487 genes in the current cancer gene census
(http://www.sanger.ac.uk/genetics/CGP/Census/), 326 are observed in
chimeric protein fusions generated through translocation. Like
point mutations, most translocations observed in most individual 
tumors seem to be random and do not lead to apparent functional 
changes in oncogenes or tumor suppressors. However, across 
populations of AML (The Cancer Genome Atlas Research Network 
2013), lung adenocarcinoma (Imielinski et al. 2012), and lung 
squamous cell carcinoma (The Cancer Genome Atlas Research 
Network 2012b), nonfunctional fusions involving tumor sup- 
pressors contribute significantly to the overall mutation spectrum 
in these cancers. 

Analysis of paired reads from WGS data led to the discovery of
chromothripsis (Stephens et al. 2011), the result of a catastrophic
shattering of one or more chromosomes followed by misjoining of
the scrambled fragments upon repair, and kataegis, the localized
bursts of point mutations often correlated with somatic structural
variation. The mechanism of neither process is clearly
understood, although it is interesting to note that chromothripsis-like
events have been observed in normal human germline DNA,
at translocation break points, and are associated with congenital
deformities and mental retardation (Kloosterman et al. 2011;
Chiang et al. 2012). Cancer cells are apparently utilizing a mecha- 
nism for recovering from catastrophic error that is available in 
normal cells, suggesting the possibility that the extremes of chro- 
mothripsis seen in some tumors have accumulated over several cell 
divisions rather than in a single event. In any case, the associa- 
tion of extensive rearrangement with key tumor suppressors and 
oncogenes in some patients suggests that chromothripsis may 
sometimes mediate tumorigenesis and associates with clinical 
outcome (Magrangeas et al. 2011; Molenaar et al. 2012; Hirsch 
et al. 2013; Malhotra et al. 2013). 

Genetic heterogeneity of tumors 

Although the existence of intra-tumor phenotypic heterogeneity
has been recognized from the early days of experimental cancer
research, intra-tumor genetic heterogeneity was not well
demonstrated until recently, with the significant advances in
high-throughput next-generation sequencing technologies. In the past
two years, researchers have characterized in detail the population 
structures and evolutionary paths in a variety of primary and 
metastatic cancers, including childhood acute lymphoblastic leu- 
kemia (Anderson et al. 2011) and secondary acute myeloid leuke- 
mia (Walter et al. 2012), human pancreatic cancers (Campbell et al. 
2010; Yachida et al. 2010), breast cancers (Geyer et al. 2010; Navin 
et al. 2010, 2011), renal clear-cell carcinomas (Gerlinger et al. 
2012), and metastatic medulloblastoma (Wu et al. 2012). Inves- 
tigations have been done using both the spatially separated cancer 
samples (Gerlinger et al. 2012) and the regionally separated cancer 
samples (Navin et al. 2010; Yachida et al. 2010). Genetically dis- 
tinguishable subpopulations in a mixture of cancer cells were 
well demonstrated by single-nucleus and single-cell sequencing 
(Navin et al. 2011; Hou et al. 2012; Xu et al. 2012). Overwhelming 
evidence shown at the DNA level and the RNA level has revealed 
the coexistence of genetically divergent clonal subpopulations 
within tumors. Intra-tumor heterogeneity has been suggested 
as a common feature in both solid tumors and hematological 
malignancies. 

Recent studies have evaluated the clonal relationships 
among primary and metastatic cancers (Campbell et al. 2010; 
Yachida et al. 2010; Navin et al. 2011; Gerlinger et al. 2012). 
These studies showed that seeding metastases requires few, if any,
additional driver mutations beyond those found in the primary 
tumors. Continuous acquisition of mutation and clonal ex- 
pansion during the development of a tumor results in an as- 
semblage of subclones that can be represented by a branching 
evolutionary tree. The metastatic cells seem to emerge from small, 
seemingly arbitrary subclones in the original tumor. Surprisingly, 
different regions of a single tumor may harbor different muta- 
tions in the same driver gene, suggesting a process of convergent 
evolution in the development of the tumor (Gerlinger et al. 
2012). The implications for clinical treatment of tumors are 
daunting since this suggests that genomic analysis from needle 
biopsies may yield a biased view of the genetic landscape of the 
tumor (Gerlinger and Swanton 2010; Anderson et al. 2011; 
Sequist et al. 2011). Yap et al. (2012) suggest that the genetic 
heterogeneity observed within most tumors, combined with the 
polygenic nature of drug resistance, may hamper both the dis- 
covery of predictive biomarkers and the development of anti- 
tumor molecular therapeutics. 

Challenges for the future

These exciting insights have only come in the past several years as 
the technologies involved in sequence capture and sequencing 
have matured (Fig. 1), enabling sequencing centers and consortia 
to more effectively exploit the utility of the reference genome. 
Here, we have focused on novel discoveries. The wealth of
information accumulated so far has illuminated the roles of all
cancer genes in the biology and pathogenesis of the disease, but 
there is much more to come, along with new challenges. 

Data sets now exist from hundreds of cancer sequencing projects, but
most of the studies from those projects focused on the components we
know best how to analyze. Thus, our analysis is biased toward
variants whose function we readily understand (the low-hanging
fruit), even if the sequence data themselves are unbiased. Although
much progress has been made, there is still considerable potential
for discovery in existing data; however, the capacity for downstream
functional analysis is overwhelmed.

First, very few of the mutations receive functional validation in
in vitro and in vivo studies. New mutations in known cancer genes are
not necessarily functional in a given patient's disease, whereas the
ones that are functional can add important new insights into the
mechanism of pathogenesis in play, as exemplified by novel mutations
in the extracellular domain of ERBB2 (Greulich et al. 2012). ERBB2
signaling is most commonly activated by amplification of the gene;
the extracellular-domain mutations suggested novel mechanisms of
activation of ERBB2 signaling in lung cancer.

Second, functional follow-up in mouse models is recognized as
valuable but remains expensive and time consuming, although new
approaches with genetically engineered mice may help (Heyer et al.
2010).

Third, the tail of the mutation profiles (Fig. 3) grows as study
cohorts become larger, adding many novel significantly mutated genes
to the cancer inventory. Computational approaches that predict the
functional consequences of amino acid substitutions have advanced
significantly as more mutational data have accumulated (Radivojac
et al. 2013), but further improvements are required. Knowing which
mutations are likely to be deleterious will be extremely helpful in
culling the list of mutations in cancer genes, though this may still
be insufficient. Many of the proteins have multiple functions, some
of which are unknown, so functional testing will always be required.
This requirement is creating a bottleneck in discovery, which must be
met with increasingly rapid and high-throughput methods for
functional validation (Chin et al. 2011).

The prediction of the functional consequences of chromosomal-scale
structural variation can be challenging. When individual events
disrupt gene structures, the interpretation is simplified. In most
cases, however, genomic changes are difficult to interpret.
Distinguishing functional from passenger mutations in the noncoding
portions of the genome is an open frontier. The completion of
ENCODE, which has identified myriad functional elements in DNA,
brings an enormous resource to bear on the challenge of interpreting
the functional role of variants outside coding sequences. Methods to
make functional predictions are not yet in hand, although approaches
that parallel coding sequence strategies are a starting point
(Chapman et al. 2011); functional follow-up remains a formidable
challenge.

There have been substantial gains on all fronts from the war on
cancer. The 10 years since the finishing of the human reference
genome have been particularly fruitful, primarily as a consequence
of the improved ability to identify and catalog somatic changes in
cancer genomes. Looking forward, perhaps the greatest challenge now
is translating discoveries into treatments based on strategies
mapped out from patients' individual mutation profiles. The number
of approved targeted cancer drugs today is still limited (Table 1A).
By and large, these reflect targets that have been available for a
long time: Three-fourths of the targeted genes were in the 2004
census. About one-half of the targets in the list of drugs under
development (Table 1B) are newly discovered, suggesting that the
impact of discovery fostered by the genomic era is starting to make
its mark. With data generation continuing unabated, revealing even
more detail of the workings of the cancer cell, the list of targets
can be expected to grow at an accelerating pace. We are realizing
the vision of Dulbecco and benefiting tremendously from the effort
to finish the human genome.
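
The proportions quoted above can be checked against Tables 1A and 1B
with simple arithmetic. The sketch below is illustrative only: the
counts are tallied by hand from the Cancer Gene Census column of the
tables as printed here (a "Yes" is counted as present in the 2004
census), and the names TABLE_COUNTS and census_fraction are
hypothetical helpers, not part of any published analysis.

```python
# Hand tally of the "Cancer Gene Census" column in Tables 1A and 1B.
# Assumption: a blank entry means the gene was absent from the 2004 census.
TABLE_COUNTS = {
    "1A (approved drugs)": {"genes": 42, "in_census": 31},
    "1B (experimental drugs only)": {"genes": 37, "in_census": 18},
}


def census_fraction(genes: int, in_census: int) -> float:
    """Return the fraction of listed genes present in the 2004 census."""
    if genes <= 0:
        raise ValueError("genes must be positive")
    return in_census / genes


for name, row in TABLE_COUNTS.items():
    frac = census_fraction(row["genes"], row["in_census"])
    print(
        f"Table {name}: {row['in_census']}/{row['genes']} genes in census "
        f"({frac:.0%}); {1 - frac:.0%} not in census"
    )
```

Under these tallies, roughly three-fourths of the genes in Table 1A
and about one-half of those in Table 1B appear in the 2004 census, in
line with the proportions stated in the text.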

Table 1A. Cancer genes with FDA-approved drugs

Gene symbol  Function                       Approved drugs  Experimental drugs  Cancer Gene Census^a
EGFR         Receptor tyrosine kinase       7               3                   Yes
KDR          Receptor tyrosine kinase       2               12
KIT          Receptor tyrosine kinase       5               2                   Yes
ERBB2        Receptor tyrosine kinase       2               0                   Yes
FLT3         Receptor tyrosine kinase       2               0                   Yes
FLT4         Receptor tyrosine kinase       2               0                   Yes
FGFR1        Receptor tyrosine kinase       1               4                   Yes
FGFR2        Receptor tyrosine kinase       2               2                   Yes
FGFR3        Receptor tyrosine kinase       1               0                   Yes
PDGFRA       Receptor tyrosine kinase       3               0                   Yes
PDGFRB       Receptor tyrosine kinase       5               0                   Yes
NTRK1        Receptor tyrosine kinase       2               0                   Yes
RET          Receptor tyrosine kinase       1               2                   Yes
ABL1         Nonreceptor tyrosine kinases   4               7                   Yes
ABL2         Nonreceptor tyrosine kinases   2               1                   Yes
LCK          Nonreceptor tyrosine kinases   1               11                  Yes
BRAF         Serine/threonine kinases       1               3                   Yes
AKT1         Serine/threonine kinases       2               3
PIM1         Serine/threonine kinases       1               20                  Yes
RAF1         Serine/threonine kinases       1               0
PPARG        Nuclear receptor               13              24                  Yes
RARA         Nuclear receptor               6               1                   Yes
GREB1        Transcription factor           2               0
JUN          Transcription factor           3               0
NFKB2        Transcription factor           1               0
ALDH2        Aldehyde dehydrogenase         3               3
SDHB         Succinate dehydrogenase        1               2                   Yes
SDHC         Succinate dehydrogenase        1               2                   Yes
SDHD         Succinate dehydrogenase        2               2                   Yes
PIK3R1       Phosphoinositide 3-kinases     1               1
ACSL3        Isozyme                        1               0
CCND1        Cyclin D1                      1               0                   Yes
COL1A1       Type 1 collagen                1               0                   Yes
GMPS         GMP synthase                   1               0                   Yes
RB1          Tumor suppressor               2               0                   Yes
ATIC         Bifunctional enzyme            2               5                   Yes
TOP1         DNA topoisomerase              4               3                   Yes
BCL2         Suppresses apoptosis           3               0                   Yes
TSHR         Receptor for thyrotropin       1               0                   Yes
SMO          G protein-coupled receptor     1               0                   Yes
CARS         Cysteinyl-tRNA synthetase      1               0
FCGR2B       Regulation of immune response  20              0

^a Yes, if present in Cancer Gene Census (Futreal et al. 2004).

Table 1B. Cancer genes with experimental drugs only

Gene symbol  Function                                Experimental drugs  Cancer Gene Census^a
JAK1         Nonreceptor tyrosine kinases            2
JAK2         Nonreceptor tyrosine kinases            5                   Yes
JAK3         Nonreceptor tyrosine kinases            2
SYK          Nonreceptor tyrosine kinase             5
IDH1         Methyltransferase inhibitor             2
IDH2         Methyltransferase inhibitor             1
CREBBP       Histone acetyltransferase               1                   Yes
MET          Receptor tyrosine kinase                9                   Yes
ITK          Intracellular tyrosine kinase           1
AKT2         Serine-threonine protein kinase         4                   Yes
CDK4         Cyclin-dependent kinases                2                   Yes
CDK6         Cyclin-dependent kinases                3
COX6C        Cytochrome c oxidase                    2                   Yes
DDX6         DEAD box helicase                       1
PTPN11       Protein tyrosine phosphatases           1                   Yes
TPR          Activation of oncogenic kinases         4                   Yes
CANT1        Calcium-dependent nucleotidase          2
HRAS         Ras oncogene                            5                   Yes
CTNNB1       WNT signaling                           1                   Yes
TP53         Transcription factor, tumor suppressor  1                   Yes
MDM2         p53-binding protein                     2
MAF          Transcription factor                    2
MKL1         Transcriptional coactivator             1
NCOA1        Nuclear receptor coactivator            21
NCOA2        Nuclear receptor coactivator            23                  Yes
IL2          Interleukin 2                           7
FHIT         Purine metabolism                       3
GNAS         Guanine nucleotide-binding proteins     2                   Yes
CPHN         Microtubule-associated protein          2                   Yes
MLLT4        Adhesion, fusion partner of ALL-1       1                   Yes
PIK3CA       Phosphoinositide-3-kinase, catalytic    1
PMS2         DNA mismatch repair                     2                   Yes
PPP2R1A      Protein phosphatase, regulatory unit    2
PRKAR1A      Protein kinase A, regulatory subunit    3                   Yes
KIF5B        Kinesin-like protein                    1
YWHAE        Adapter protein                         1
MYH11        Myosins                                 2                   Yes

^a Futreal et al. 2004.